Rule Generation and Rule Selection Techniques for Cost-Sensitive Associative Classification

Authors

  • Adriano Veloso
  • Wagner Meira
Abstract

Classification aims to assign a data object to its appropriate class, which is traditionally performed through a small dataset model such as a decision tree. Associative classification is a novel strategy for performing this task in which the model is composed of a particular set of association rules, where the consequent of each rule (i.e., its right-hand side) is restricted to the class attribute. Rule generation and rule selection are two major issues in associative classification. Rule generation aims to find a set of association rules that best describe the entire dataset, while rule selection aims to select, for a particular case, the best rule among all the rules generated. Rule generation and rule selection techniques dramatically affect the effectiveness of the classifier. In this paper we propose new techniques for rule generation and rule selection. In our proposed technique, rules are generated based on the concept of maximal frequent class itemsets (increasing the size of the rule pattern), and then selected based on their informative value and on the cost that an error implies (possibly reducing misclassifications). We validate our techniques using two important real-world problems: spam detection and protein homology detection. Further, we compare our techniques against existing ones, ranging from the well-known naïve Bayes to domain-specific classifiers. Experimental results show that our techniques are able to achieve a significant improvement of 30% in the effectiveness of the classification.
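
The two steps named in the abstract, rule generation from maximal frequent class itemsets and cost-aware rule selection, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `mine_rules` and `classify`, the toy data, the cost values, and the brute-force itemset enumeration are all assumptions made for illustration, and the expected-cost formula is exact only for two classes.

```python
from collections import Counter
from itertools import combinations

# Toy training data (hypothetical): (set of items, class label).
TRAIN = [
    ({"free", "offer"}, "spam"),
    ({"free", "offer", "click"}, "spam"),
    ({"free", "meeting"}, "ham"),
    ({"meeting", "agenda"}, "ham"),
    ({"offer", "click"}, "spam"),
    ({"agenda", "free"}, "ham"),
]

# Hypothetical cost matrix keyed by (true class, predicted class).
# Flagging a legitimate message as spam is assumed to cost 5x more.
COST = {("spam", "ham"): 1.0, ("ham", "spam"): 5.0}


def mine_rules(train, min_support=2):
    """Return {(antecedent, cls): confidence} for maximal frequent class itemsets."""
    item_count = Counter()  # occurrences of each itemset
    pair_count = Counter()  # occurrences of each itemset together with a class
    for items, cls in train:
        # Brute-force enumeration of all non-empty subsets (toy data only).
        for k in range(1, len(items) + 1):
            for combo in combinations(sorted(items), k):
                key = frozenset(combo)
                item_count[key] += 1
                pair_count[(key, cls)] += 1
    frequent = {pair for pair, n in pair_count.items() if n >= min_support}
    rules = {}
    for ante, cls in frequent:
        # Keep only maximal itemsets: no frequent proper superset for the same class.
        if any(c == cls and ante < other for other, c in frequent):
            continue
        rules[(ante, cls)] = pair_count[(ante, cls)] / item_count[ante]
    return rules


def classify(rules, instance, cost, default):
    """Fire the applicable rule with the lowest expected misclassification cost."""
    classes = {true for true, _ in cost}
    best_cls, best_cost = default, float("inf")
    for (ante, cls), conf in rules.items():
        if not ante <= instance:
            continue  # rule does not apply to this instance
        # The rule is wrong with probability 1 - conf; charge the worst
        # cost of a wrong prediction of `cls` (exact for two classes).
        expected = (1.0 - conf) * max(cost[(t, cls)] for t in classes if t != cls)
        if expected < best_cost:
            best_cls, best_cost = cls, expected
    return best_cls


rules = mine_rules(TRAIN)
print(classify(rules, {"free", "offer", "click"}, COST, default="ham"))  # spam
```

With these assumed costs, a rule predicting "spam" with confidence 0.8 has expected cost of about 1.0, while a rule predicting "ham" with confidence 0.6 has expected cost 0.4, so the lower-confidence rule is chosen. That is the sense in which selection depends on the cost of an error and not on confidence alone.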

Similar resources

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

NEW CRITERIA FOR RULE SELECTION IN FUZZY LEARNING CLASSIFIER SYSTEMS

Designing an effective criterion for selecting the best rule is a major problem in the process of implementing Fuzzy Learning Classifier (FLC) systems. Conventionally, confidence and support, or combined measures of these, are used as criteria for fuzzy rule evaluation. In this paper, new entities, namely precision and recall from the field of Information Retrieval (IR) systems, are adapted as alternative...

A tree-projection-based algorithm for multi-label recurrent-item associative-classification rule generation

Associative classification is a promising classification method based on association-rule mining. A significant amount of work has already been dedicated to the process of building a classifier based on association rules. However, a relatively small amount of research has been performed on association-rule mining from multi-label data. In such data each example can belong, and thus should be classi...

Eager, Lazy and Hybrid Algorithms for Multi-Criteria Associative Classification

Classification aims to map a data instance to its appropriate class (or label). In associative classification the mapping is done through an association rule with the consequent restricted to the class attribute. Eager associative classification algorithms build a single rule set during the training phase, and this rule set is used to classify all test instances. Lazy algorithms, however, do no...

A Margin-based Model with a Fast Local Search for Rule Weighting and Reduction in Fuzzy Rule-based Classification Systems

Fuzzy Rule-Based Classification Systems (FRBCS) are widely investigated by researchers due to their noise stability and interpretability. Unfortunately, generating a rule base that is both sufficiently accurate and interpretable is a hard process. Rule weighting is one of the approaches to improving the accuracy of a pre-generated rule base without modifying the original rules. Most of the pro...

Publication date: 2005